Abstract. In an attempt to classify all of the overlap-free morphisms constructively using the Latin-square morphism, we came across an interesting counterexample, the Leech square-free morphism. We generalize the combinatorial properties of the Leech square-free morphism to gain insight into a larger class of both overlap-free morphisms and square-free morphisms.


1. Introduction 

The study of overlap-free words and their generators originated with Axel Thue in 1912 [9]. Thue came across overlap-free words while trying to find infinite words that are cube-free. Recall that a cube is a word of the form $XXX$, where $X$ is a nonempty word, and that a word $W$ avoids cubes if no factor of $W$ is a cube. A classical infinite binary word that avoids cubes is the Thue-Morse infinite word,

01101001100101101001011001101001 . . . 



[S]. This infinite word can be generated by repeatedly applying the Thue-Morse morphism $\mu$ to the letter 0. The Thue-Morse morphism is defined as

$$\mu(t) = \begin{cases} 01 & \text{for } t = 0,\\ 10 & \text{for } t = 1.\end{cases}$$

Further, we define a morphism as a mapping $h\colon \Sigma^* \to \Delta^*$, with $\Sigma$ and $\Delta$ alphabets, such that $h(VW) = h(V)h(W)$ for any two words $V, W \in \Sigma^*$. A morphism $h$ is called cube-free provided that $h(W)$ is cube-free if and only if $W \in \Sigma^*$ is cube-free.
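The iteration just described is easy to reproduce. The following Python sketch is our own illustration, not code from the paper. It stores a morphism as a dictionary from letters to image words and extends it letter by letter using the rule $h(VW) = h(V)h(W)$. It then iterates $\mu$ on the letter 0. The names `MU`, `apply_morphism` and `iterate` are hypothetical.

```python
from typing import Dict

# A morphism on finite words, stored as a map letter -> image word.
Morphism = Dict[str, str]

# The Thue-Morse morphism: mu(0) = 01, mu(1) = 10.
MU: Morphism = {"0": "01", "1": "10"}


def apply_morphism(h: Morphism, word: str) -> str:
    """Extend h from letters to words via h(VW) = h(V)h(W)."""
    return "".join(h[letter] for letter in word)


def iterate(h: Morphism, word: str, times: int) -> str:
    """Return the result of applying h `times` times to `word`."""
    for _ in range(times):
        word = apply_morphism(h, word)
    return word


if __name__ == "__main__":
    print(iterate(MU, "0", 5))  # the 32-letter prefix displayed above
```

Each application doubles the length, so $\mu^n(0)$ has $2^n$ letters. In particular, $\mu^5(0)$ is exactly the 32-letter prefix displayed above.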

Our primary concern, however, is overlaps rather than cubes. An overlap is a word of the form $cXcXc$, where $c$ is a single letter and $X$ is a possibly empty word. The standard example of a word that is itself an overlap is "alfalfa" (take $c = a$ and $X = lf$). An overlap-free word is a word in which no overlap occurs as a factor.
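An overlap $cXcXc$ with $p = |cX|$ is exactly a factor of length $2p + 1$ that has period $p$. Likewise, a cube $XXX$ is a factor of length $3|X|$ with period $|X|$. The brute-force Python sketch below tests both conditions directly; the function names are hypothetical. It is meant for checking small examples, not for efficiency.

```python
def has_period(factor: str, p: int) -> bool:
    """True if factor[k] == factor[k + p] wherever both positions exist."""
    return all(factor[k] == factor[k + p] for k in range(len(factor) - p))


def has_overlap(word: str) -> bool:
    """Does word contain a factor cXcXc (c a letter, X possibly empty)?"""
    n = len(word)
    for p in range(1, (n - 1) // 2 + 1):          # p = |cX|
        for i in range(n - 2 * p):                 # factor word[i : i + 2p + 1]
            if has_period(word[i:i + 2 * p + 1], p):
                return True
    return False


def has_cube(word: str) -> bool:
    """Does word contain a factor XXX with X nonempty?"""
    n = len(word)
    for p in range(1, n // 3 + 1):                 # p = |X|
        for i in range(n - 3 * p + 1):             # factor word[i : i + 3p]
            if has_period(word[i:i + 3 * p], p):
                return True
    return False


print(has_overlap("alfalfa"))  # True: c = a, X = lf
```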

A morphism $h$ is said to be overlap-free if, for every $X \in \Sigma^*$, $X$ is overlap-free if and only if $h(X)$ is overlap-free. Surprisingly, it is known that $\mu$ and its natural complement are the only non-trivial overlap-free morphisms on the two-letter alphabet $\{0, 1\}$ [2].

In the early 1980s, Crochemore, Ehrenfeucht, and Rozenberg made substantial progress towards classifying the square-free morphisms [3], [4]. Further, in 2004, Richomme and Wlazinski published a result classifying all overlap-free morphisms [8]. However, their result, much like the results of Crochemore, Ehrenfeucht, and Rozenberg, relies on test sets of words for the morphism in question. Furthermore, the test sets of Richomme and Wlazinski grow factorially with the size of the input alphabet.

Since the late 1990s and early 2000s, several results have pursued a constructive understanding of the class of overlap-free morphisms. In 2001, Frid suggested using the structure of the cyclic group of order $n$ to define each image word of a morphism on an alphabet of $n$ letters [5]. In 2007 we extended Frid's result, using the Latin-square structure to define our morphisms [10].

In a vain attempt to use the Latin-square morphism construction to classify all of the overlap-free morphisms, we came across the Leech square-free morphism in [1]. The Leech square-free morphism is

$$h(t) = \begin{cases} 0121021201210 & \text{for } t = 0,\\ 1202102012021 & \text{for } t = 1,\\ 2010210120102 & \text{for } t = 2,\end{cases}$$

which originally appeared in [B]. This led us to the definition of the morphism with unstackable image words. Note that the definition depends upon a combinatorial property and is not entirely constructive; we have yet to overcome this problem.
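The displayed morphism has structure that is easy to confirm by machine. The Python sketch below is our own, and the names `LEECH`, `uniform_length` and `is_cyclic_shift_morphism` are hypothetical. It checks that the morphism is 13-uniform. It also checks that each image word $h(t)$ is obtained from $h(0)$ by adding $t$ modulo 3 to every letter.

```python
LEECH = {
    "0": "0121021201210",
    "1": "1202102012021",
    "2": "2010210120102",
}


def uniform_length(h):
    """Return n if every image word has length n, else None."""
    lengths = {len(image) for image in h.values()}
    return lengths.pop() if len(lengths) == 1 else None


def is_cyclic_shift_morphism(h, k):
    """Check h(t) == h(0) with each digit increased by t modulo k, for t < k."""
    base = h["0"]
    return all(
        h[str(t)] == "".join(str((int(c) + t) % k) for c in base)
        for t in range(k)
    )


print(uniform_length(LEECH), is_cyclic_shift_morphism(LEECH, 3))  # 13 True
```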



2. Preliminaries

We will use the standard definitions from the Lothaire book on combinatorics on words [7], with a few additions.

We begin by defining an alphabet $\Sigma$ to be a finite set of symbols from which we make words by concatenation (we use capital Greek letters for alphabets). Further, we define a word $W$ to be a finite list of symbols from an alphabet $\Sigma$, written horizontally (we use capital letters to denote words and lowercase letters to denote letters). We denote the word with no letters, that is, the empty word, by $\varepsilon$.

2.1. Words. The length (number of letters) of a word $W$ is written $|W|$. Note that we use the same symbol for the size of a set and for absolute value; the meaning will be clear from context. Notice that $|\varepsilon| = 0$. Further, we write $|W|_a$ for the number of times the letter $a$ occurs in $W$, and $|W|_{aba}$ for the number of times the word $aba$ occurs in $W$ (occurrences may overlap). For example, if $C = abaababa$, then $|C|_{aba} = 3$ and $|C| = 8$.

A word $U$ is a factor of a word $V$ if there exist two (possibly empty) words $S$ and $T$ such that $V = TUS$. We also say that $U$ is a subword of $V$ (or that $V$ contains $U$). If $T = \varepsilon$, then we call $U$ a prefix of $V$. Similarly, if $S = \varepsilon$, then we call $U$ a suffix of $V$.

For an alphabet $\Sigma$, $\Sigma^*$ is the Kleene closure of the alphabet; that is, $\Sigma^*$ is the set of all words over $\Sigma$, including $\varepsilon$. Notice that $\Sigma^*$ is the free monoid over the set $\Sigma$.

2.2. Morphisms. A morphism $h$ is a mapping from $\Sigma^*$ into $\Delta^*$, where $\Sigma$ and $\Delta$ are alphabets, such that $h(WV) = h(W)h(V)$ for all words $W, V \in \Sigma^*$, and $h(\varepsilon) = \varepsilon$. Note that $W$ and $V$ could be single letters. If $X \subseteq \Sigma^*$ is a set of words, then $h(X)$ denotes the set of words $\{h(W) : W \in X\}$. Further, we call $h$ non-erasing if $h(a) \neq \varepsilon$ for all $a \in \Sigma$.

Recall from earlier that the Thue-Morse morphism $\mu$ is defined as

$$\mu(t) = \begin{cases} 01 & \text{for } t = 0,\\ 10 & \text{for } t = 1.\end{cases}$$

It is a morphism defined on the alphabet with two letters. For convenience, we call the alphabet with $n$ letters $\Sigma_n$. Infinite words can be generated with such a morphism. We denote the $n$th Thue-Morse word by $\mu^n(0)$, and we use $\omega$ to denote the first infinite ordinal. So the Thue-Morse infinite word becomes

$$\mathbf{T} = \lim_{n \to \infty} \mu^n(0) = \mu^{\omega}(0),$$

as previously seen. Note that we use bold capital letters to represent infinite words, with $\mathbf{T}$ here representing the Thue-Morse infinite word.

When discussing the two-letter alphabet $\Sigma_2 = \{0, 1\}$, we use $\bar{0}$ to denote the complement of 0 (or $\bar{1}$ for the complement of 1). That is, $\bar{0} = 1$ and $\bar{1} = 0$. This will become necessary in Chapter 2.

For a morphism $h\colon \Sigma^* \to \Delta^*$, we call $h$ uniform if there is an integer $n$ with $|h(a)| = n$ for all $a \in \Sigma$ (more precisely, in this case we call $h$ $n$-uniform). Recall that a square is a word of the form $XX$ with $X$ nonempty. We call a morphism $h\colon \Sigma^* \to \Delta^*$ square-free when, for every $W \in \Sigma^*$, $h(W)$ is square-free if and only if $W$ is square-free. Similarly, we call $h\colon \Sigma^* \to \Delta^*$ an overlap-free morphism when $h(W)$ is overlap-free if and only if $W \in \Sigma^*$ is overlap-free.
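The word notions of Section 2.1 translate directly into code. In the Python sketch below, the helper names are hypothetical. `count_occurrences` computes $|W|_U$ and counts overlapping occurrences. Note that Python's built-in `str.count` counts only non-overlapping occurrences, so it would return 2 rather than 3 for the example $C = abaababa$.

```python
def count_occurrences(word: str, pattern: str) -> int:
    """|W|_U: number of positions where pattern occurs in word (overlaps allowed)."""
    if not pattern:
        raise ValueError("pattern must be a nonempty word")
    m = len(pattern)
    return sum(1 for i in range(len(word) - m + 1) if word[i:i + m] == pattern)


def is_factor(u: str, v: str) -> bool:
    """U is a factor of V if V = TUS for some possibly empty T, S."""
    return u in v


def is_prefix(u: str, v: str) -> bool:
    return v.startswith(u)   # the case T = empty word


def is_suffix(u: str, v: str) -> bool:
    return v.endswith(u)     # the case S = empty word


def complement(word: str) -> str:
    """Letterwise complement on {0, 1}: 0 -> 1 and 1 -> 0."""
    return word.translate(str.maketrans("01", "10"))


C = "abaababa"
print(count_occurrences(C, "aba"), len(C))  # 3 8
```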
3. The Morphism With Unstackable Image Words

In a vain attempt to classify all of the overlap-free morphisms using the Latin square morphisms [10], we came across the Leech square-free morphism in [1]. The following is the Leech square-free morphism:

$$h(t) = \begin{cases} 0121021201210 & \text{for } t = 0,\\ 1202102012021 & \text{for } t = 1,\\ 2010210120102 & \text{for } t = 2,\end{cases}$$

which originally appeared in [5]. The fact that this morphism is overlap-free put a hole in our attempt to classify all of the overlap-free morphisms using Latin square morphisms. On the other hand, it suggested that we could find another class of overlap-free morphisms that can be explained better than with test sets as in [8].

Using the test-set result given by Richomme and Wlazinski, we found the following overlap-free morphisms on four letters:

$$f(t) = \begin{cases} 01231230103213210 & \text{for } t = 0,\\ 12302301210320321 & \text{for } t = 1,\\ 23013012321031032 & \text{for } t = 2,\\ 30120123032102103 & \text{for } t = 3,\end{cases}$$

and

$$g(t) = \begin{cases} 012301221211203210 & \text{for } t = 0,\\ 123013003033010321 & \text{for } t = 1,\\ 230120123310221032 & \text{for } t = 2,\\ 301230110100132103 & \text{for } t = 3.\end{cases}$$
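A small check of the two displays is straightforward; the digits below are transcribed from the displays above, and the dictionary names `F`, `G` and the helper functions are hypothetical. It shows that $f$ is 17-uniform and that its image words are cyclic shifts of $f(0)$ modulo 4. As printed, $g$ is 18-uniform and its image words are not cyclic shifts of $g(0)$. For both morphisms, the image words begin with pairwise distinct letters and end with pairwise distinct letters.

```python
F = {
    "0": "01231230103213210",
    "1": "12302301210320321",
    "2": "23013012321031032",
    "3": "30120123032102103",
}
G = {
    "0": "012301221211203210",
    "1": "123013003033010321",
    "2": "230120123310221032",
    "3": "301230110100132103",
}


def image_lengths(h):
    """Sorted list of the distinct image-word lengths."""
    return sorted({len(image) for image in h.values()})


def is_cyclic_shift_morphism(h, k):
    """Check h(t) == h(0) with each digit increased by t modulo k."""
    base = h["0"]
    return all(
        h[str(t)] == "".join(str((int(c) + t) % k) for c in base)
        for t in range(k)
    )


def ends_pairwise_distinct(h):
    """Do the image words begin, and also end, with pairwise distinct letters?"""
    firsts = {image[0] for image in h.values()}
    lasts = {image[-1] for image in h.values()}
    return len(firsts) == len(h) == len(lasts)


for name, h in (("f", F), ("g", G)):
    print(name, image_lengths(h), is_cyclic_shift_morphism(h, 4), ends_pairwise_distinct(h))
```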



The morphism $g$ raised a considerable number of questions as to why it is overlap-free. It seemed to evade many of the techniques used in the proof for the Latin square morphisms. So the natural question was: what does the morphism $g$ have in common with the Leech square-free morphism that makes it overlap-free?

4. Definitions and Theorems 

The overlap-free morphisms displayed above are tied together with the following definition. 

Definition 4.1. Let $h\colon \Sigma^* \to \Delta^*$ be an $n$-uniform morphism. We say that $h$ is a morphism with unstackable image words if it satisfies the following properties:

(i) $h(W)$ is overlap-free for all overlap-free words $W \in \Sigma^*$ with $|W| = 3$.

(ii) For $a, b \in \Sigma$, and for all $V \in \Delta^*$ such that $|V| \le \lfloor n/2 \rfloor$,

$$h(a) = SV \quad\text{and}\quad h(b) = VU$$

if and only if $S$ is not a suffix of any image word of $h$ and $U$ is not a prefix of any image word of $h$.
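Condition (ii) can be checked mechanically once it is read as a constraint on overlaps between image words. The Python sketch below adopts one reading of it; this reading is our assumption, not a statement from the paper. Suppose a nonempty suffix $V$ of $h(a)$ with $|V| \le \lfloor n/2 \rfloor$ is also a prefix of $h(b)$, and write $h(a) = SV$ and $h(b) = VU$. Then $S$ must not be a suffix of any image word, and $U$ must not be a prefix of any image word. The function name `stacking_violations` is hypothetical.

```python
LEECH = {
    "0": "0121021201210",
    "1": "1202102012021",
    "2": "2010210120102",
}


def stacking_violations(h):
    """List (a, b, V) triples that break our reading of condition (ii)."""
    images = list(h.values())
    n = len(images[0])                      # h is assumed n-uniform
    violations = []
    for a, ha in h.items():
        for b, hb in h.items():
            for k in range(1, n // 2 + 1):  # 1 <= |V| <= floor(n/2)
                v = ha[n - k:]
                if not hb.startswith(v):
                    continue
                s, u = ha[:n - k], hb[k:]   # h(a) = SV, h(b) = VU
                if any(w.endswith(s) for w in images) or any(
                    w.startswith(u) for w in images
                ):
                    violations.append((a, b, v))
    return violations


print(stacking_violations(LEECH))  # []
```

For the Leech morphism, the only suffix-prefix matches of this length are $V = 0$ and $V = 01210$ (together with their cyclic shifts). Neither match produces a violation, so the list is empty.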



We now prove a lemma that captures the combinatorial properties in the first part of Definition 4.1.

Lemma 4.2. Let $\Sigma$ be an alphabet with more than one letter. Let $h\colon \Sigma^* \to \Delta^*$ be a morphism such that $h(W)$ is overlap-free for all overlap-free $W \in \Sigma^*$ with $|W| = 3$. Then we have the following properties:

(i) $h(a)$ is overlap-free for all $a \in \Sigma$.

(ii) $h(a)h(b)$ is overlap-free for all $a, b \in \Sigma$.

(iii) $h(a)$ and $h(b)$ do not begin with the same letter and do not end with the same letter, whenever $a, b \in \Sigma$ and $a \neq b$.

Proof. (i) Let us first note that the result does not apply when $|\Sigma| = 1$, because there are no overlap-free words of length three over this alphabet. For $|\Sigma| > 1$ the result is clear. Assume, for a contradiction, that $h(a)$ contains an overlap for some $a \in \Sigma$. Then $h(bab)$ with $b \neq a$ also contains an overlap. This contradicts our assumption, since $bab$ is overlap-free.

(ii) Similarly to (i), assume that $h(a)h(b) = h(ab)$ contains an overlap for some $a \neq b$. Then $h(aba)$ contains an overlap, again contradicting our assumption. We must also show that $h(aa)$ does not contain an overlap. Assume for a contradiction that it does. Then, choosing $b \neq a$, the image $h(aab)$ contains an overlap, which is again a contradiction.

(iii) Assume that for some distinct $a, b \in \Sigma$, $h(a) = xY$ and $h(b) = xZ$ begin with the same letter $x$. Then $h(aab) = xYxYxZ$ contains the overlap $xYxYx$, which contradicts our assumption since $aab$ is overlap-free. The argument that $h(a)$ and $h(b)$ cannot end with the same letter is similar, using $h(abb)$. □
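The three conclusions of Lemma 4.2 are finite checks on the image words. The Python sketch below evaluates them; the names are hypothetical. It assumes $h$ is non-erasing, so every image word has a first and a last letter. It also reproduces the overlap used in the proof of (iii), for a morphism whose image words share a first letter.

```python
from itertools import product


def has_overlap(word):
    """Does word contain a factor of length 2p + 1 with period p (cXcXc)?"""
    n = len(word)
    return any(
        all(word[i + k] == word[i + k + p] for k in range(p + 1))
        for p in range(1, (n - 1) // 2 + 1)
        for i in range(n - 2 * p)
    )


def lemma_4_2_conclusions(h):
    """Return the truth values of conclusions (i), (ii), (iii) for h."""
    letters = list(h)
    single = all(not has_overlap(h[a]) for a in letters)
    pairs = all(not has_overlap(h[a] + h[b]) for a, b in product(letters, repeat=2))
    firsts = {h[a][0] for a in letters}
    lasts = {h[a][-1] for a in letters}
    distinct = len(firsts) == len(letters) == len(lasts)
    return single, pairs, distinct


MU = {"0": "01", "1": "10"}
BAD = {"0": "01", "1": "02"}        # both image words begin with 0
print(lemma_4_2_conclusions(MU))     # (True, True, True)
print(lemma_4_2_conclusions(BAD))    # (True, True, False)
print(has_overlap(BAD["0"] * 2 + BAD["1"]))  # True: h(001) = 010102 contains 01010
```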

Theorem 4.3. Any morphism with unstackable image words is overlap-free. 

Proof. We begin by assuming that $h$ is a morphism with unstackable image words with $|h(a)| = n$ for all $a \in \Sigma$. We must show that for all $W \in \Sigma^*$, $W$ is overlap-free if and only if $h(W)$ is overlap-free. We begin with the easy direction.

4.1. The $\Leftarrow$ direction. We argue by contrapositive. Assume that $W = AcXcXcB$ with $c \in \Sigma$ and $A, X, B \in \Sigma^*$; we show that $h(W)$ must also contain an overlap. Notice that

$$h(W) = h(A)h(c)h(X)h(c)h(X)h(c)h(B).$$

Set $h(c) = dY$, where $d \in \Delta$ and $Y \in \Delta^*$. Then $h(W) = h(A)\,dYh(X)\,dYh(X)\,dY\,h(B)$. So $h(W)$ contains the overlap $dYh(X)dYh(X)d$, and we are done with the first part of our argument.
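For an $n$-uniform morphism, the construction in this argument is explicit. An overlap in $W$ starting at position $i$ with $p = |cX|$ yields the overlap $dYh(X)dYh(X)d$ in $h(W)$, starting at position $ni$ with period $np$. The Python sketch below locates an overlap in $W$ and returns the corresponding factor of $h(W)$; the helper names are hypothetical.

```python
def find_overlap(word):
    """Return (i, p) for the first factor word[i : i + 2p + 1] with period p, or None."""
    n = len(word)
    for p in range(1, (n - 1) // 2 + 1):
        for i in range(n - 2 * p):
            if all(word[i + k] == word[i + k + p] for k in range(p + 1)):
                return i, p
    return None


def image_overlap(h, word):
    """Map an overlap cXcXc of word to the overlap dY h(X) dY h(X) d of h(word)."""
    found = find_overlap(word)
    if found is None:
        return None
    i, p = found
    n = len(next(iter(h.values())))          # h is assumed n-uniform, n >= 1
    image = "".join(h[c] for c in word)
    start, period = n * i, n * p              # |h(A)| = n|A| and |h(cX)| = np
    return start, period, image[start:start + 2 * period + 1]


MU = {"0": "01", "1": "10"}
print(image_overlap(MU, "01010"))  # (0, 4, '011001100')
```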

4.2. The $\Rightarrow$ direction. Conversely, we again argue by contrapositive. We assume that $h(W)$ contains an overlap and show that $W$ must also contain an overlap. So assume that for some $W \in \Sigma^*$ we have

$$h(W) = A c_{j_0} X c_{j_1} X c_{j_2} B,$$

where $c = c_{j_0} = c_{j_1} = c_{j_2}$. We use the subscripts 0, 1, and 2 to distinguish the three occurrences of $c$. Further, the index $j_i$ gives the position of that letter in the word $h(W)$, where we index beginning with 0.

We proceed with two separate arguments. The first shows that it is not possible to write $h(W)$ in this way with $|cX| \not\equiv 0 \pmod{n}$. The second shows that $W$ must contain an overlap if $|cX| \equiv 0 \pmod{n}$.

4.2.1. The $|cX| \not\equiv 0 \pmod{n}$ case. Notice that the overlap in $h(W)$ must be contained in $h(Z)$ for some factor $Z$ of $W$ with $|Z| > 3$. Otherwise we would contradict hypothesis (i) in the definition of morphisms with unstackable image words.

We begin by setting

$$r_i \equiv j_i \pmod{n},$$

where $i \in \{0, 1, 2\}$ and $r_i \in \{0, 1, \ldots, n - 1\}$. We argue first based on the number of tiles that the overlap occurs in, and then by cases. When the overlap occurs over four tiles, we observe four cases (an overlap over three tiles would contradict the hypothesis). The cases are

$$r_0 < r_2 < r_1, \qquad r_2 < r_0 < r_1, \qquad r_1 < r_0 < r_2, \qquad r_1 < r_2 < r_0.$$

We note that the cases $r_0 < r_1 < r_2$ and $r_2 < r_1 < r_0$ force the overlap to occur in a number of tiles other than four. When the overlap occurs in more than four tiles, we more simply consider the two cases $r_0 < r_1$ and $r_1 < r_0$. Finally, since $j_1 - j_0 = j_2 - j_1 = |cX|$, we have the following relationship between $r_0$, $r_1$, and $r_2$:

$$r_2 \equiv 2r_1 - r_0 \pmod{n}. \tag{1}$$

Consider the notion of a tiling of a line segment. We apply this notion to $h(W)$, where the tiles are the image words of $h$. Note that all the image words have the same length $n$; this is crucial to our argument. For ease, we use $T_{s_i}$, with $i \in \{0, 1, 2\}$, to denote the tile containing $c_{j_i}$. Here $s_i$ is the number of the tile when the tiles are numbered starting with the first tile as $T_0$. Thus $j_i = n s_i + r_i$, that is, $s_i = \lfloor j_i / n \rfloor$.
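Because every tile has length $n$, the tile number and the offset of a position $j$ in $h(W)$ are just the quotient and remainder of $j$ by $n$. The Python sketch below computes $(s_i, r_i)$ for the three occurrences of $c$ and confirms relation (1) on an example. The names are hypothetical, and the example values $j_0 = 5$, $|cX| = 7$, $n = 13$ are chosen only for illustration.

```python
def tile_and_offset(j: int, n: int):
    """Position j of h(W) lies in tile T_s at offset r, where j = n*s + r."""
    return divmod(j, n)


def overlap_tiles(j0: int, period: int, n: int):
    """(s_i, r_i) for the positions j_0, j_1 = j_0 + |cX| and j_2 = j_0 + 2|cX|."""
    return [tile_and_offset(j0 + i * period, n) for i in range(3)]


(s0, r0), (s1, r1), (s2, r2) = overlap_tiles(j0=5, period=7, n=13)
print((s0, r0), (s1, r1), (s2, r2))   # (0, 5) (0, 12) (1, 6)
print(r2 == (2 * r1 - r0) % 13)       # True, as in equation (1)
```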

The overlap is contained in four tiles. Let us consider the case where there is some factor $Z$ of $W$ with $|Z| = 4$ such that the overlap in $h(W)$ is contained in $h(Z)$. As in the argument that Latin square morphisms are overlap-free, we consider the word $h(Z)$ to be a line. We draw small vertical lines to mark the edges of the tiles, and taller labeled vertical lines to mark the $c$'s in the overlap.





Figure 1. The short overlap with $r_2 < r_0 < r_1$

In Figure 1 we have taken $c_{j_0} X c_{j_1} X c_{j_2}$ and written it twice, aligning $c_{j_0} X c_{j_1}$ in the upper line with $c_{j_1} X c_{j_2}$ in the lower line, in order to equate letters throughout the overlap. Figure 1 displays the case $r_2 < r_0 < r_1$. We remark that the case $r_2 = r_0 < r_1$ proceeds in the same manner.

Let $V$ be the final $r_1 - r_0$ letters of the tile $T_{s_0}$, as drawn in Figure 1. Similarly, we choose $U$ to be the first $r_1 - r_2$ letters of $T_{s_1}$. In the situation $r_2 < r_0 < r_1$, equation (1) gives $n - (r_1 - r_0) = r_1 - r_2$, that is, $|U| + |V| = n$. Hence we must have $|U| = r_1 - r_2 \le \lfloor n/2 \rfloor$ or $|V| = r_1 - r_0 \le \lfloor n/2 \rfloor$. In the case $|V| \le \lfloor n/2 \rfloor$, we cannot equate $U$ with any prefix of a tile, which leads to a contradiction. In the other case, $|U| \le \lfloor n/2 \rfloor$, we cannot equate $V$ with any suffix of a tile, which again leads to a contradiction. So this case is not possible.

We now consider the case $r_0 < r_2 < r_1$, as shown in Figure 2.

Figure 2. The short overlap with $r_0 < r_2 < r_1$



In this case we choose $V$ to be the final $r_1 - r_0$ letters of $T_{s_0}$, and $U$ to be the first $r_1 - r_2$ letters of $T_{s_1}$, as drawn in Figure 2. Again we notice that $n - (r_1 - r_0) = r_1 - r_2$, so either $|V| \le \lfloor n/2 \rfloor$ or $|U| \le \lfloor n/2 \rfloor$. Either of these is impossible, so this case cannot occur either.

We now consider the cases $r_1 < r_2 < r_0$ and $r_1 < r_0 < r_2$. Figure 3 gives the situation when $r_1 < r_2 < r_0$. The case $r_1 < r_2 = r_0$ is similar, and the same remark applies to the arguments above. Notice that in both of these cases we have $n - (r_2 - r_1) = r_0 - r_1$.

For the case $r_1 < r_2 < r_0$ depicted in Figure 3, let $V$ be the final $r_0 - r_1$ letters of $T_{s_1}$, and let $U$ be the first $r_2 - r_1$ letters of $T_{s_2}$. Now either $|U| \le \lfloor n/2 \rfloor$ or $|V| \le \lfloor n/2 \rfloor$. In either case we have a contradiction.

Now we consider the case $r_1 < r_0 < r_2$, which is displayed in Figure 4.

In this case we again let $V$ be the final $r_0 - r_1$ letters of $T_{s_1}$, and $U$ the first $r_2 - r_1$ letters of $T_{s_2}$. We then have either $|V| \le \lfloor n/2 \rfloor$ or $|U| \le \lfloor n/2 \rfloor$. Either case is a contradiction. So the overlap cannot occur in four tiles, and we turn to the case when the overlap occurs in more than four tiles.

Figure 3. The short overlap with $r_1 < r_2 < r_0$

Figure 4. The short overlap with $r_1 < r_0 < r_2$

We also note that if the overlap occurs in five tiles, the same arguments hold.
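The four-tile cases use the identities $n - (r_1 - r_0) = r_1 - r_2$ and $n - (r_2 - r_1) = r_0 - r_1$. Both follow from equation (1) alone, so they can be confirmed exhaustively for small $n$. In the Python sketch below (hypothetical names), `u_v_lengths` returns $(|U|, |V|)$ as defined in the four-tile cases. In every case $|U| + |V| = n$, which is why one of them is at most $\lfloor n/2 \rfloor$.

```python
def u_v_lengths(n: int, r0: int, r1: int):
    """(|U|, |V|) in the four-tile cases, with r2 fixed by equation (1); else None."""
    r2 = (2 * r1 - r0) % n
    if r0 < r1 and r2 < r1:   # r2 < r0 < r1, r2 = r0 < r1, or r0 < r2 < r1
        return r1 - r2, r1 - r0
    if r1 < r0 and r1 < r2:   # r1 < r2 < r0, r1 < r2 = r0, or r1 < r0 < r2
        return r2 - r1, r0 - r1
    return None               # r0 = r1, r0 < r1 < r2, or r2 < r1 < r0


for n in range(2, 50):
    for r0 in range(n):
        for r1 in range(n):
            lengths = u_v_lengths(n, r0, r1)
            if lengths is not None:
                u, v = lengths
                assert u + v == n and min(u, v) <= n // 2
print("identities hold for 2 <= n < 50")
```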

The overlap is contained in more than four tiles. We look at the cases $r_0 < r_1$ and $r_1 < r_0$, and we only look at the beginning of the overlap. We first consider Figure 5, for the case $r_0 < r_1$.




Figure 5. The long overlap with $r_0 < r_1$ and $r_1 - r_0 > \lfloor n/2 \rfloor$

We consider the case $r_1 - r_0 > \lfloor n/2 \rfloor$; the logic behind the argument for $r_1 - r_0 \le \lfloor n/2 \rfloor$ is covered below.

Let $V$ be the final $r_1 - r_0$ letters of $T_{s_0}$. Equating the two lines then yields $V$ as the first $r_1 - r_0$ letters of $T_{s_1+1}$. Since $|V| > \lfloor n/2 \rfloor$, we can equate the remainder of $T_{s_1+1}$, call it $U$ (drawn with a dotted line in Figure 5), with a prefix of $T_{s_0+1}$. So we have $T_{s_1+1} = VU$. Similarly, we can choose $S \in \Delta^*$ such that $T_{s_0+1} = US$. Notice now that $|U| = n - (r_1 - r_0) \le \lfloor n/2 \rfloor$. Thus $S$ cannot begin any image word of $h$, so the overlap is impossible.

A note on the case $r_1 - r_0 \le \lfloor n/2 \rfloor$: here $|V| \le \lfloor n/2 \rfloor$, and we would not be able to equate $U$.

Figure 6 gives the case $r_1 < r_0$ with $r_0 - r_1 \le \lfloor n/2 \rfloor$. In a manner similar to the case $r_0 < r_1$, we pick $V$ to be the suffix of length $r_0 - r_1$ of $T_{s_1}$. We can then possibly find an image word $T_{s_0+1} = VU$ for some $U \in \Delta^*$. But because $|V| = r_0 - r_1 \le \lfloor n/2 \rfloor$, $U$ cannot be the prefix of any image word, so this formulation of the overlap is impossible.

Figure 6. The long overlap with $r_1 < r_0$ and $r_0 - r_1 \le \lfloor n/2 \rfloor$



So we see that for $h(W)$ to contain an overlap, we must have $|cX| \equiv 0 \pmod{n}$.

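4.2.2. The $|cX| \equiv 0 \pmod{n}$ case. From Lemma 4.2 we know that the image words of $h$ begin with pairwise distinct letters and end with pairwise distinct letters. Since $|cX| \equiv 0 \pmod{n}$, we have $r_0 = r_1 = r_2$, so the tile boundaries in the two aligned lines coincide. In particular, the suffix of $T_{s_0}$ beginning at $c_{j_0}$ is identical to the suffix of $T_{s_1}$ beginning at $c_{j_1}$. These two tiles therefore end with the same letter, which implies $T_{s_0} = T_{s_1}$. Similarly, the prefixes of $T_{s_1}$ and $T_{s_2}$ ending at $c_{j_1}$ and $c_{j_2}$ begin with the same letter, which shows that $T_{s_1} = T_{s_2}$.

Pick $d$ to be the letter such that $h(d) = T_{s_0} = T_{s_1} = T_{s_2}$. Further, because

$$h(W) = A c_{j_0} X c_{j_1} X c_{j_2} B,$$

we can find factors $C, D, Y$ of $W$ so that

$$h(W) = h(CdYdYdD) = AcXcXcB.$$

Now we must have $W = CdYdYdD$, which contains the overlap $dYdYd$. Thus we are done.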

□ 
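The preimage step in this case relies on each tile determining its letter. The Python sketch below (hypothetical names) decodes a tile-aligned word back to its preimage, reading one tile of length $n$ at a time. It assumes, as Lemma 4.2(iii) guarantees here, that the image words begin with pairwise distinct letters.

```python
LEECH = {
    "0": "0121021201210",
    "1": "1202102012021",
    "2": "2010210120102",
}


def decode(h, image):
    """Recover W from h(W), one tile at a time, using the first letter of each tile."""
    n = len(next(iter(h.values())))
    by_first = {w[0]: a for a, w in h.items()}
    if len(by_first) != len(h):
        raise ValueError("image words must begin with distinct letters")
    if len(image) % n != 0:
        raise ValueError("length of image is not a multiple of n")
    letters = []
    for start in range(0, len(image), n):
        tile = image[start:start + n]
        a = by_first.get(tile[0])
        if a is None or h[a] != tile:
            raise ValueError(f"tile {start // n} is not an image word of h")
        letters.append(a)
    return "".join(letters)


W = "01010"                                   # contains the overlap 01010
image = "".join(LEECH[c] for c in W)
print(decode(LEECH, image) == W)              # True
```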

5. The Square-Free Adaptation 

In the same way as for the overlap-free case, we can define square-free morphisms with unstackable image words as follows.

Definition 5.1. Let $h\colon \Sigma^* \to \Delta^*$ be an $n$-uniform morphism. We call $h$ a square-free morphism with unstackable image words if it satisfies the following properties:

(i) $h(W)$ is square-free for all square-free words $W \in \Sigma^*$ with $|W| = 3$.

(ii) $h(a)$ and $h(b)$ do not begin with the same letter and do not end with the same letter, for all $a, b \in \Sigma$ with $a \neq b$.

(iii) For $a, b \in \Sigma$, and for all $V \in \Delta^*$ such that $|V| \le \lfloor n/2 \rfloor$,

$$h(a) = SV \quad\text{and}\quad h(b) = VU$$

if and only if $S$ is not a suffix of any image word of $h$ and $U$ is not a prefix of any image word of $h$.
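Properties (i) and (ii) of Definition 5.1 are finite checks. The Python sketch below (hypothetical names) tests them for the Leech morphism by brute force over all words of length 3. Condition (iii) is not checked here. For contrast, the Thue-Morse morphism fails property (i).

```python
from itertools import product

LEECH = {
    "0": "0121021201210",
    "1": "1202102012021",
    "2": "2010210120102",
}
MU = {"0": "01", "1": "10"}


def has_square(word):
    """Does word contain a factor XX with X nonempty?"""
    n = len(word)
    return any(
        word[i:i + p] == word[i + p:i + 2 * p]
        for p in range(1, n // 2 + 1)
        for i in range(n - 2 * p + 1)
    )


def property_i(h):
    """h(W) is square-free for every square-free W of length 3."""
    for letters in product(sorted(h), repeat=3):
        w = "".join(letters)
        if not has_square(w) and has_square("".join(h[c] for c in w)):
            return False
    return True


def property_ii(h):
    """Image words begin, and also end, with pairwise distinct letters."""
    firsts = {image[0] for image in h.values()}
    lasts = {image[-1] for image in h.values()}
    return len(firsts) == len(h) == len(lasts)


print(property_i(LEECH), property_ii(LEECH))  # True True
print(property_i(MU))                          # False: mu(010) = 011001 contains 11
```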

Words like $aab$ contain the square $aa$, so hypothesis (i) says nothing about $h(aab)$. For this reason, the argument of Lemma 4.2(iii) does not carry over. We must therefore add property (ii) to Definition 5.1, so that we can use a similar preimage argument in the final part of the proof. Thus we have the following theorem.

Theorem 5.2. Any square-free morphism with unstackable image words is square-free. 

Proof. Assume that $h$ is a square-free morphism with unstackable image words such that $|h(a)| = n$ for all $a \in \Sigma$. We must show that for all $W \in \Sigma^*$, $W$ is square-free if and only if $h(W)$ is square-free. We begin with the easy direction.

5.1. The $\Leftarrow$ direction. We proceed by contrapositive. So assume that $W = AXXB$, where $X \in \Sigma^+$ and $A, B \in \Sigma^*$. Write

$$h(W) = h(AXXB) = h(A)h(X)h(X)h(B),$$

which contains the square $h(X)h(X)$. So we are done with this direction.
5.2. The $\Rightarrow$ direction. Again we argue by contrapositive. So we assume that

$$h(W) = A c_{j_0} X d_{i_0} c_{j_1} X d_{i_1} B, \tag{2}$$

where $c = c_{j_0} = c_{j_1} \in \Delta$, $d = d_{i_0} = d_{i_1} \in \Delta$, and $A, X, B \in \Delta^*$. We use $c_{j_0}$ and $c_{j_1}$ to mark the beginnings of the two halves of the square $cXdcXd$. Similarly, $d_{i_0}$ and $d_{i_1}$ mark their ends.

There are two cases to consider: $|cXd| \not\equiv 0 \pmod{n}$ and $|cXd| \equiv 0 \pmod{n}$. We show that $|cXd| \not\equiv 0 \pmod{n}$ is impossible in a manner analogous to the proof of Theorem 4.3, as seen in Section 4.2.1.

So assume that $|cXd| \equiv 0 \pmod{n}$. From property (ii) of Definition 5.1, we know that the image words of $h$ begin with pairwise distinct letters and end with pairwise distinct letters. Further, the suffix of the tile containing $c_{j_0}$ must be identical to the suffix of the tile containing $c_{j_1}$. Thus these two tiles are the same image word; call it $h(z)$ for some $z \in \Sigma$. So, because

$$h(W) = AcXdcXdB,$$

we can find factors $C, D, Y$ of $W$ such that

$$h(W) = h(CzYzYD) = AcXdcXdB.$$

Thus we have $W = CzYzYD$, which contains the square $zYzY$, and we are done. □